Abstract

Background: Technological advances have enabled the widespread use of video cases via web-streaming and 
online download as an educational medium. The use of real subjects to demonstrate acute pathology should aid 
the education of health care professionals. However, the methodology by which this effect may be tested is not 
clear. 

Methods: We undertook a literature review of major databases, identified articles relevant to using patient
video cases as educational interventions, extracted the methodologies used and assessed these methods for
internal and construct validity.

Results: A review of 2532 abstracts revealed 23 studies meeting the inclusion criteria and a final review of 18 of 
relevance. Medical students were the most commonly studied group (10 articles) with a spread of learner 
satisfaction, knowledge and behaviour tested. Only two of the studies fulfilled defined criteria on achieving internal 
and construct validity. The heterogeneity of articles meant it was not possible to perform any meta-analysis. 

Conclusions: Previous studies have not clearly classified which facet of training or educational outcome they
aim to explore, and have had poor internal and construct validity. Future research should aim to validate a particular
outcome measure, preferably by reproducing previous work rather than adopting new methods. In particular,
cognitive processing enhancement, demonstrated in a number of the medical student studies, should be tested at
a postgraduate level.
Background

There is a plethora of educational programmes and implementation strategies aimed at improving the quality
of care delivered by health care professionals. A number of these are delivered via information technology
systems, with the use of video as an educational medium well established [1-3]. A new educational tool that has
become possible through multimedia advances in the last decade is the audio-visual demonstration of signs
and symptoms in patients, referred to as Patient Video Cases or PVCs [4]. They are easily displayed via online
platforms, are widely used, and have been endorsed by the National Patient Safety Agency [5] as an example of
good practice. However, there is little academic study of their effectiveness. Given the financial pressures affecting
all health care agencies, it is important to know whether these resource-intensive e-learning strategies give
demonstrable benefit to patients or health care professionals.

Theoretical constructs exist to evaluate interventions designed to improve clinical performance, but no single
approach is followed, due to the wide range of individual and organisational factors that affect the outcomes
before, during and after the intervention [6]. Kirkpatrick's training evaluation is defined by four distinct levels of
outcome to be approached in a stepwise fashion [7]. The four key domains of the Kirkpatrick model are learner
satisfaction, learner knowledge, learner behaviour change and organisational change. Although others have argued
that contextual factors not classified under these domains may be significant [6], the Kirkpatrick model remains a
valid methodology, with systematic reviews using the process to examine training effectiveness [8]. A
healthcare-relevant modification of the Kirkpatrick model has been used in a study of inter-professional education
in health and social care [9]. When using the Kirkpatrick model, or other relevant frameworks for assessing an
educational or training intervention, the outcome measures and the methodology by which they are obtained must
be valid. The concepts of internal and construct validity are classifications with direct relevance to outcome
measures and are components of methodological quality used by the Campbell Collaboration [10,11].

i. Internal Validity is the extent to which the
observed change can reliably be ascribed to
the intervention

ii. Construct Validity relates to the association between
the concept being investigated and the measures used
to test it, i.e. do the data collected accurately reflect
the outcome measure chosen?

Other forms of validity exist but are not directly relevant
to the quality of the outcome measures chosen. For example,
good external validity would imply that using PVCs could
be beneficial in different populations, but would not
indicate whether the initial outcome measure was fit
for purpose.

The aim of this work is to answer the question "What is the validity and quality of outcome measures that have
been used to evaluate interventions based on PVCs?" This literature review will be used to identify which outcome
measures are most valid in the assessment of the clinical effectiveness of an intervention based on PVCs. It will
also help identify areas where more methodological research is needed to enable future studies to demonstrate
high internal and construct validity.

Methods 

This review was performed over three stages: the first
stage collated relevant literature, stage two appraised
the quality of individual studies, and stage three provided
a summation of the overall validity of the studies.

Stage one 

Stage one identified literature relevant to the use of PVCs in health care settings. The definition of Health Care
Settings used was: 'any location or environment where students or graduates are practising or learning medicine.'
The definition of a PVC was: 'any pre-recorded or live video footage of a patient used for the purposes of
demonstrating a sign or symptom'. It did not include footage recorded for the purposes of educating other patients
or families. Inclusion criteria were:

i. Humans 

ii. The study described the use of PVCs in a training,
educational (undergraduate or postgraduate) or
implementation capacity or environment.

As PVCs relate to the demonstration of signs and symptoms in patients, studies using video to demonstrate verbal
communication, non-lexical utterances or solely history taking between a patient and doctor or patient and patient
were excluded, as were non-English language papers which could not be translated. The full literature search was
developed in conjunction with a senior NHS Librarian and is available on request. The following general search
terms were used: (Video* OR Video record* OR video clip OR digital* record* OR analogue recording OR patient
video clip) AND (Educat* OR Train* OR learn* OR teach* OR inservice training). The following databases were
searched: Medline, British Nursing Index (BNI), EMBASE, Health Management Information Consortium (HMIC),
CINAHL, NIHR Health Technology Assessment Programme (HTA), Database of Abstracts of Reviews of
Effects (DARE), Scopus, The Cochrane Library and the Education Resources Information Centre (ERIC). Internet
search engines and NHS Evidence were used to identify publications or articles related to the search terms. The
search strategy was not limited to any particular research methodology used in the articles. The last search was
performed on 27th July 2012 by the principal author. In all phases of the study any uncertainty as to classification
or indexing of information was discussed with the collaborating authors.
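
The general search terms above combine two groups of truncated terms with a boolean AND. As an illustration only, the following Python sketch shows how such a strategy could be expressed and applied to the text of an abstract. It is not the search actually run: each database has its own query syntax, and the function names and the regular-expression treatment of the truncation symbol are assumptions made for this example.

```python
import re

# Term groups copied from the general search terms; '*' marks truncation.
VIDEO_TERMS = ["Video*", "Video record*", "video clip", "digital* record*",
               "analogue recording", "patient video clip"]
EDUCATION_TERMS = ["Educat*", "Train*", "learn*", "teach*", "inservice training"]


def term_to_regex(term):
    """Turn one search term into a regex; '*' matches any word ending (assumed)."""
    words = [re.escape(word).replace(r"\*", r"\w*") for word in term.split()]
    return r"\b" + r"\s+".join(words) + r"\b"


def group_matches(terms, text):
    """True when at least one term of the group occurs in the text (OR)."""
    return any(re.search(term_to_regex(t), text, re.IGNORECASE) for t in terms)


def abstract_matches(text):
    """True when the text hits both groups: (video terms) AND (education terms)."""
    return group_matches(VIDEO_TERMS, text) and group_matches(EDUCATION_TERMS, text)


def build_query():
    """Render the boolean search string in the form given in the text."""
    return "({}) AND ({})".format(" OR ".join(VIDEO_TERMS),
                                  " OR ".join(EDUCATION_TERMS))
```

Keeping the two term groups as separate lists makes the AND between them explicit, which is the step most easily lost when a strategy is adapted to a different database.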

Articles with a relevant abstract (any detail relating to the recording and utilisation of video clips of patients)
had a complete paper review (as did any abstracts in which there was uncertainty about inclusion potential).
Information on aim, health care user, educational purpose, modified Kirkpatrick training level domain, type of
study, outcome measure and conclusions was extracted from each paper as shown in Table 1. The educational
purpose was subdivided into three categories:

Stage two 

To enable an objective review of the articles and to determine
the aspects of validity under study, the following domains
were used; they represent features that reduce the internal
validity of studies. They have been amended from the list
described by Farrington [12]. This work was chosen as it

Table 1 Studies by Healthcare professional grouping

Health care professional                              Number of studies pertaining to that group
Undergraduates                                        11
Basic Postgraduate Training                           3
Specialist Postgraduate Training                      1
Undergraduate professionals allied to Medicine        2
Professional allied to Medicine                       1
Trained Doctor Continuing Professional Development    2
Veterinary Students                                   1
Total                                                 21



is based on Cook and Campbell's original work on methodological
quality. Although other methodologies of analysis are available,
this is a widely used and accepted process which allows an
objective approach to be applied.

1. Selection: Does the outcome measure allow for 
control between groups? 

2. History: Does the outcome measure allow for the 
effects caused by some event occurring at the same 
time as the intervention? 

3. Maturation: Does the outcome measure allow for 
natural progression in learning and knowledge? 

4. Instrumentation: Is the outcome measure 
reproducible? 

5. Testing: Does the outcome measure itself affect the 
results? 

6. Differential attrition: Can the outcome measure
control for differing numbers of participants in
control or experimental groups (if present) or large
drop-out rates?

The extraction of information was undertaken by the 
principal author. 

Stage three 

Once this process was complete, a number of more global
questions were asked of each paper to determine whether
its authors had evaluated the outcome methods they had
chosen, and to allow an assessment of the construct
validity of the study.

a) How was the choice of outcome measure justified? 

b) Did the choice determine the results the study aims 
to investigate? 

c) To what extent were the writers aware of the 
disadvantages as well as the advantages of the 
outcome measures chosen? 

d) How did they overcome the disadvantages? 



Results 

Figure 1 shows the flow of articles from the initial search to the final selection. The types of healthcare
professionals studied are shown in Table 1, and the number of studies classified by educational purpose and
Kirkpatrick level is shown in Table 2. Two studies evaluated both undergraduate and basic postgraduate trainees,
leading to a total of 21 studies of health care professional groups, and two studies evaluated both learner
knowledge and learner behaviour, leading to a total of 20 studies of the relevant Kirkpatrick level.
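
The totals quoted here can be cross-checked with simple arithmetic. The following Python sketch is offered as an illustration only: the counts are transcribed from Tables 1 and 2, while the variable and function names are assumptions introduced for this example and are not part of the review.

```python
# Counts transcribed from Table 1 (studies per health care professional group).
GROUP_COUNTS = {
    "Undergraduates": 11,
    "Basic Postgraduate Training": 3,
    "Specialist Postgraduate Training": 1,
    "Undergraduate professionals allied to Medicine": 2,
    "Professional allied to Medicine": 1,
    "Trained Doctor Continuing Professional Development": 2,
    "Veterinary Students": 1,
}

# Counts transcribed from Table 2.
PURPOSE_COUNTS = {
    "Knowledge Gain": 8,
    "Testing Methods": 3,
    "Patient Examination Skills": 5,
    "Overall Clinical Care": 2,
}
KIRKPATRICK_COUNTS = {
    "Learner Reaction (level 1)": 5,
    "Learner Knowledge (2a)": 0,
    "Learner Knowledge (2b)": 8,
    "Learner Behaviour": 7,
}

DISTINCT_STUDIES = 18  # number of distinct studies in the final review


def total(counts):
    """Sum one column of a table."""
    return sum(counts.values())


def surplus(counts, distinct=DISTINCT_STUDIES):
    """Entries beyond the distinct studies, i.e. extra category memberships."""
    return total(counts) - distinct
```

The educational purpose column sums to the 18 distinct studies, whereas the professional group and Kirkpatrick level columns sum to 21 and 20 because some studies are counted under more than one category.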

The purpose of this work was to be as inclusive as possible so as to capture all outcome measures used.
Although twenty-two articles (twenty-three studies) underwent a thorough analysis in stage two, half of these
require further clarification as to the reasons for their inclusion. These articles were reviewed by all three
authors and a collaborative decision reached on their inclusion. Under the inclusion criteria it had not been the
intention to include animal studies in the protocol. However, one study in the field of veterinary medicine [13]
examined PVCs in precisely the context in which human patient clips would be used, with an accompanying
relevant and feasible methodology. It has been included in the final review as it was decided that methodology
rather than context was being investigated. The search was repeated with the 'human only' limitation removed,
but no other veterinary articles of relevance were found.

Figure 1 Literature Search Flow Diagram.

One study examining an intervention to improve the physical examination component of a medical student
exam via a web-based video did not specifically use abnormal or normal clinical signs [14]. The study looked
at outcomes across a whole year group in a before and after cohort design. This study has been included as the
methodology could easily have been used in a PVC-related intervention. A study using video to demonstrate
a specific clinical examination was also included, although it could be argued that the precise aim of the
tool was not demonstrating specific clinical signs but a methodology of elucidating them. The methodology
used, a Solomon four-group design [15], was considered relevant to defining robust outcome measures in future
PVC studies.
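
A Solomon four-group design crosses two factors: whether a group sits an entrance test (pretest) and whether it receives the intervention. This allows the effect of testing to be separated from the effect of the video. The Python sketch below illustrates such an allocation under the assumption of simple randomisation into arms of near-equal size; the names, the seed and the allocation procedure are hypothetical and are not taken from the cited study [15].

```python
import random
from collections import namedtuple

# One arm is defined by two yes/no factors.
Arm = namedtuple("Arm", ["pretest", "intervention"])

# The four arms of a Solomon four-group design.
ARMS = [
    Arm(pretest=True, intervention=True),    # entrance test, video
    Arm(pretest=True, intervention=False),   # entrance test, control
    Arm(pretest=False, intervention=True),   # no entrance test, video
    Arm(pretest=False, intervention=False),  # no entrance test, control
]


def allocate(participants, seed=0):
    """Randomly split participants as evenly as possible across the four arms."""
    rng = random.Random(seed)      # fixed seed keeps the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {arm: [] for arm in ARMS}
    for index, person in enumerate(shuffled):
        groups[ARMS[index % len(ARMS)]].append(person)
    return groups
```

Comparing the two intervention arms with each other, and the two control arms with each other, indicates whether sitting the entrance test itself changed the outcome.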

Finally, six studies [16-21], although in different patient groups (ankylosing spondylitis, rheumatoid
arthritis, fibromyalgia), used exactly the same methodology as two initial studies into osteoarthritis by
the same investigators. These were studies in the validation of an examination methodology in both
medical students and consultants. Although the actual data were different, the papers used exactly the
same introduction, methods and discussion. In terms of the narrative review, these eight journal articles
represent only one methodological approach in two different cohorts of participants. Given the lack of
difference in the wording of the arthritis publications, it was felt that these should be considered as
two studies, one representing undergraduates and the other


Table 2 Classification of studies

Educational purpose          Number of studies   Kirkpatrick level            Number of studies
Knowledge Gain               8                   Learner Reaction (level 1)   5
Testing Methods              3                   Learner Knowledge (2a)       0
Patient Examination Skills   5                   Learner Knowledge (2b)       8
Overall Clinical Care        2                   Learner Behaviour            7
Total                        18                  Total                        20

Table 3 Identification of health care settings in which educational patient video clips have been utilised



Columns: Paper; Aim or hypothesis; Health care user; Educational purpose; Kirkpatrick level; Type of study; Outcome methodology; Conclusions



Using interactive 
video to add 
physical assessment 
data to computer 
based patient 
simulations [22] 



Interactive video in 
patient simulations 
improves the 
learning experience 
[Not formally Stated 
in paper] 



The development of 
shared cognition in 
paediatric residents 
analysing a patient 
video versus a 
paper patient case 
[23] 



Enhancing 
diagnostic accuracy 
among non-experts 
through use of 
video cases [24] 



Introduction of
patient video clips 
into computer- 
based testing: 
Effects on item 
statistics and 
reliability estimates 
[25] 

Video-based test 
questions: A novel 
means of evaluation 
[26] 



To compare the 
basic characteristics 
and reliability of 
questions using 
video-based 
vignettes to 
questions using 
analogous text- 
based vignettes. 

Video-Based Test 
items are supported 
by students [Not 
formally stated by 
paper] 



Basic 

Postgraduate 
Training 



Overall 
Clinical Care 



Learner 
Behaviour 
(level 3) 



supplementing 
a written case 
vignette by a PVC 
as opposed to an 
equivalent paper 
case increase shared 
cognition as 
measured by the 
frequency of 
collaborative 
concept link 
formation in the 
context of 
collaborative 
learning about 
movement 
disorders. . ..? 

(1) How does the 
level of diagnostic 
accuracy evolve 
through an 
interactive 
teamwork approach 
using PVCs? 

(2) Does the level of 
diagnostic accuracy 
differ between non- 
experts and 
experts? 



Medical 
Students 



Knowledge 
Gain 



Learner
Knowledge 
(level 2b) 



Specialist 

Postgraduate 

Training 



Knowledge 
Gain 



Medical 
Students 



Testing 
Methods 



Learner 
Knowledge 
(level 2b) 



Undergraduate 
professionals 
allied to 
Medicine 



Testing 
Methods 



Learner 
reaction 
(level 1) 



Comparative 



Cluster 
Randomised 
Control Trial 



Learner Repeated 
Knowledge Measures 
(level 2b) Design 



Parallel test 
questions 
with cross- 
over of video 
and text 
descriptors. 



(1) 

Questionnaire 

utilising 

repeated 

measures 

ANOVAs 



Chart Review of 
interview and 
physical 
examination 
description of 
actual patients 
versus simulation 
performance 

Identification and 
frequency of 
collaborative and 
individual concept 
links



Analysis of 
frequency and of 
new diagnoses 
and new clinical 
reasoning 
processes as new 
information 
presented during 
review of PVC 
case. 



Good to fair 
agreement on 
overall comparison 
(kappa=0.72) and 
physical 
examinations 
(kappa=0.7) 

The video group 
showed a 
significant increase 
(p<0.01) in 
collaborative 
concept links but 
not in individual 
concept links after 
watching the video 



Median Item 
statistics and 
reliability estimates 
for test items 



Students 
preference 
between video- 
based and 
multiple choice 
questions 



i) New clinical 
reasoning processes 
were most frequent 
at first review of the 
PVC. 

ii) Frequency of 
new relevant 
diagnoses were 
stable at each step 
whereas less 
relevant diagnoses 
decreased. 

iii) Relevant clinical 
reasoning was 
significantly higher 
amongst non- 
experts compared 
with experts at the 
small group 
discussion and think 
aloud procedure 
with content expert. 

Overall, video-based 
questions had 
comparable 
difficulty and 
discrimination 
compared to 
analogous text- 
based questions. 

Students thought 
video based 
questions deepened 
understanding and 
recommended 
video-based 
questions be used 
in future exams. 



Video-based test 
questions: A novel 
means of evaluation 
[26] 



Unclear 



Use of animation- 
enhanced video 
clips for teaching 
abnormal breathing 
patterns [13] 



Using web-based 
video to enhance 
physical 

examination skills in 
medical students 
[14] 



Teaching the 
plantar reflex [15] 



Undergraduate
professionals
allied to
Medicine

Testing
Methods



A comparison of 
critical thinking in 
groups of third-year 
medical students in 
text, video, and 
virtual PBL case 
modalities [27] 



Comparison of text 
and video cases in a 
postgraduate 
problem-based 
learning format [28] 



Critical Thinking, as 
exemplified by the 
discourse among 
students during 
group discussion, 
differs among 
groups receiving 
the same case with 
the same facilitator 
in one of three 
formats. 

The addition of a 
video case to 
written information 
would lead to a 
greater increase in 
the frequency of 
data exploration, 
theory building and 
evaluation and 
metareasoning than 
would be a 
achieved by a paper 
case. 

To gather feedback 
regarding the 
Animated Breathing 
Pattern Videotape 



To measure 
changes in first year 
students' 
performance of 
physical 

examinations on 
standardized [sic] 
patients after 
implementation of a 
web- based 
curriculum 



Medical 
Students 



Medical 
Students 



3rd year
Veterinary 

Students, 
House Officers 
and Faculty, 
Qualified 
Veterinarians 

Medical 
Students 



To test the efficacy of
video-tape in the
evaluation of the
plantar response

Medical
Students



Knowledge 
Gain 



Knowledge 
Gain 



Patient 

Examination 

Skills 



Patient 

Examination 

Skills 



Patient 

Examination 

Skills 



Learner 
Knowledge 
(level 2b) 



Exam scores in 1 2 
video-based items 



Learner 
Knowledge 
(level 2b) 



Learner 
Knowledge 
(level 2b) 



Learner
reaction 
(level 1) 



Learner 
Behaviour 
(level 3) 



Learner 
Behaviour 
(level 3) 



(2) 

Observational 
comparative 
study (One 
group 
informed 
about video 
questions the 
other not) 

Three way
comparative
study (face-2-
face with text,
face-2-face
with video,
virtual with
video)



Randomised
comparative
study of video
versus text
cases



Critical thinking
discourse analysis



Frequency of pre- 
defined clause 
categories 



Questionnaire 



Usefulness and 
Satisfaction 



Students informed 
about video clips 
correctly answered 
more video based 
items 



The virtual groups 
had the highest 
critical-thinking 
ratio. Except for the 
problem- 
identification stage, 
the video groups 
had higher ratios 
than the text groups
did. 



The verbal 
interaction showed 
statistically 
significant 
improvements in 
data exploration, 
theory building and 
theory evaluation 
after the video case 



Uniformly positive 
responses 



Before and 
After Cohort 
Outcome 
study 



(i) Percent correct
score in physical
exam item
checklist

(ii) Mean score on
physical exam
process instrument



Students on Web-
based curriculum
had higher level of
competency and
reduction in poor
performance levels



Solomon Four
Group Design
- Two
experimental
and control
groups (with
and without
entrance test)



Correct judgement
of graded
presence of clinical
sign



Small non-
significant
difference between
experimental and
control groups
[evidence of
sampling error]. If
analysis was
restricted to
students who
performed an
entrance test there
was a statistical
significance in
favour of the video
group.



A videotape-based 
training method for 
improving the 
detection of 
depression in 
residents of long- 
term care facilities 
[29] 



Advantages of 
video trigger in 
problem-based
learning [30] 



A triangulated 
approach to the 
assessment of 
teaching in 
childhood epilepsy 
[31] 



How video cases 
should be used as 
authentic stimuli in 
problem-based 
medical education 
[32] 

Visual expertise in 
paediatric 
neurology [33] 



An evaluation of 
the effectiveness of 
a videotape 
programme on 
inter-observer 
reliability in 
outcome 
assessment for 
osteoarthritis [34] 



An evaluation of 
the effectiveness of 
a videotape 
programme on 
inter-observer 
reliability in 
outcome 
assessment for 
fibromyalgia [16] 



Does a training 
programme 
involving video 
based scenarios 
improve nursing 
staffs' detection of 
depression within 
long-term care 
facilities? [Not 
formally stated by 
the paper] 

The reasons behind 
preferences for 
video triggers or 
paper cases in 
students and 
facilitators who are 
accustomed to 
paper cases. 

Evaluation of 
participant 
perceptions of 
learning 



To examine 
students' views on
the value of video 
cases compared to 
text based cases. 



Professionals 
allied to 
Medicine 



Overall 
Clinical Care 



Medical 
Students 



Knowledge 
Gain 



Medical 
Students and
Basic 

Postgraduate 
Training 



Medical 
Students 



Knowledge 
Gain 



To investigate visual 
attention and 
cognitive processes 
of clinicians of 
varying degrees of 
experience 
diagnosing 
authentic paediatric 
video case 

Whether 
interobserver 
variability in senior 
medical students 
could be reduced in 
a group of patients 
with OA using only 
a single viewing of 
an instructional 
videotape. 



Whether 
interobserver 
variability in senior 
medical students 
could be reduced in 
a group of patients 
with fibromyalgia 
using only a single 
viewing of an 



Medical 
Students, Basic 
Postgraduate 
Training and 
Consultant CPD 



Knowledge 
Gain 



Knowledge 
Gain 



Medical 
Students 



Patient 

Examinations 

Skills 



Medical 
Students 



Patient 

Examinations 

Skills 



Learner 
Knowledge 
(level 2b) 
and Learner 
Satisfaction 
(level 3) 



Parallel group (I) Videotape 



Learner 
Reaction 
(level 1) 



Learner 
Reaction 
(level 1) 



Learner 
Reaction 
(level 1) 



Learner 
Knowledge 
(level 2b) 
and 

Behaviour 

(Level 3) 



Learner 
Behaviour 
(level 3) 



Learner 
Behaviour 
(level 3) 



delayed 

intervention 

design. 



Questionnaire 



vignette test 

(ii) Written Test 

(iii) Course 
evaluation 
questionnaire 



Usefulness and 
Satisfaction 



Triangulation Participant 



Significant increase 
in performance in 
the intervention 
group which was 
maintained for at 
follow up for both 
vignette and 
written test. 

Good levels of 
satisfaction on 
questionnaire 

Video triggers were 
preferred by both 
students and 
facilitators over 
paper cases in 
Problem Based 
Learning 

Videos identified as 



Outcome 
Analysis 



assessment (rating the most useful and 



scales, open 
ended questions 
and focus groups), 
Lecturer reflection 
and peer 
observations 



Focus Groups Thematic Analysis 



Observational 
study 



Eye-tracking data 
were analysed 
with verbal
recordings. 



Before and 

After Study 

[Video 

intervention 

poorly 

described] 



Before and 

After Study 

[Video 

intervention 

poorly 

described] 



Change in mean
values of
previously
described observer
dependent
measures per
participant

Reliability
coefficients for
the group



Change in mean
values of
previously
described observer
dependent
measures per
participant



interesting teaching 
tool. Results cross- 
validated by lecturer 
and peer 
observations 

Video generally 
valuable but benefit 
dependent on
certain conditions. 



More experienced 
clinicians were 
more accurate in 
visual diagnosis and 
spent more of their 
time looking at 
relevant areas 



Pre-standardization 
reliability 
coefficients were 
<0.80 for seven 
measures. 
Coefficients for the 
performance of 
knee goniometry 
were uniformly low. 
Following the 
intervention, all but 
four reliability 
coefficients were 
>/= 0.93. 

Pre-standardization
reliability
coefficients were
<0.80 for 8
measures. Following
standardization all
reliability
coefficients but
one approximated
or exceeded 0.80



An evaluation of 
the effectiveness of 
a videotape 
programme on 
inter-observer 
reliability in 
outcome 
assessment for 
ankylosing 
spondylitis [18] 



instructional 
videotape. 

Whether 
interobserver 
variability in senior 
medical students 
could be reduced in 
a group of patients 
with ankylosing 
spondylitis using 
only a single 
viewing of an 
instructional 
videotape. 



An evaluation of 
the effectiveness of 
a videotape 
programme on 
inter-observer 
reliability in 
outcome 
assessment for 
rheumatoid arthritis 
[17] 

Osteoarthritis 
antirheumatic drug 
trials: Effects of a 
standardized 
instructional 
videotape on the 
reliability of 
observer-dependent
outcome measures 
[35] 

Fibromyalgia 
antirheumatic drug 
trials: Effects of a 
standardized 
instructional 
videotape on the 
reliability of 
observer-dependent 
outcome measures 
[19] 

Rheumatoid arthritis 
antirheumatic drug 
trials: Effects of a 
standardized 
instructional 
videotape on the 
reliability of 
observer-dependent 
outcome measures 
[20] 



Whether 
interobserver 
variability in senior 
medical students 
could be reduced in 
a group of patients 
with rheumatoid 
arthritis using only a 
single viewing of an 
instructional 
videotape. 

Whether 
interobserver 
variability in 
consultants could 
be reduced in a 
group of patients 
with OA using only 
a single viewing of 
an instructional 
videotape. 

Whether 
interobserver 
variability in 
consultants could 
be reduced in a 
group of patients 
with fibromyalgia 
using only a single 
viewing of an 
instructional 
videotape. 

Whether 
interobserver 
variability in 
consultants could 
be reduced in a 
group of patients 
with rheumatoid 
arthritis using only a 
single viewing of an 
instructional 
videotape. 



Medical 
Students 



Patient 

Examinations 

Skills 



Learner 
Behaviour 
(level 3) 



Medical 
Students 



Patient Learner 
Examinations Behaviour 
Skills (level 3) 



Consultant CPD 



Patient 

Examination 

Skills 



Learner 
Behaviour 
(level 3) 



Consultant CPD 



Patient 

Examinations 

Skills 



Learner 
Behaviour 
(level 3) 



Consultant CPD 



Patient 

Examinations 

Skills 



Learner 
Behaviour 
(level 3) 



Before and 

After Study 

[Video 

intervention 

poorly 

described] 



Before and 

After Study 

[Video 

intervention 

poorly 

described] 



Before and 

After Study 

[Video 

intervention 

poorly 

described] 



Before and 

After Study 

[Video 

intervention 

poorly 

described] 



Before and 

After Study 

[Video 

intervention 

poorly 

described] 



Reliability
coefficients for
the group

Change in mean
values of
previously
described observer
dependent
measures per
participant

Reliability
coefficients for
the group



Change in mean
values of
previously
described observer
dependent
measures per
participant

Reliability
coefficients for
the group

Change in mean
values of
previously
described observer
dependent
measures per
participant

Reliability
coefficients for
the group

Change in mean
values of
previously
described observer
dependent
measures per
participant

Reliability
coefficients for
the group

Change in mean
values of
previously
described observer
dependent
measures per
participant

Reliability
coefficients for
the group



Pre-standardization 
reliability 

coefficients were < 
0.80 for three 
measures. Following 
standardization 12 
reliability 
coefficients 
exceeded 0.80. For 
the majority of 
measures pre- 
standardization 
reliability 
coefficients were 
high and no further 
improvement in 
reliability could be 
demonstrated 

Pre-standardization 
reliability 
coefficients were 
>0.80 for all 
measures and 
remained above 
0.80 following 
standardization 
except for one 
measure 

Prestandardization 
reliability 
coefficients were 
>0.80 for all 
measures and 
remained above 
0.80 following the 
intervention 



Prestandardization
reliability
coefficients were
<0.80 for 8
measures. Following
standardization all
reliability
coefficients
approximated to or
exceeded 0.80.

Prestandardization 
reliability 
coefficients were 
>0.80 for all 
measures and 
remained above 
0.80 following 
standardization 



Ankylosing spondylitis antirheumatic drug trials: Effects of a standardized instructional videotape on the reliability of observer-dependent outcome measures [21]

Whether interobserver variability in consultants could be reduced in a group of patients with ankylosing spondylitis using only a single viewing of an instructional videotape.

Consultant CPD

Patient Examinations Skills

Learner Behaviour (level 3)

Before and After Study [Video intervention poorly described]

Change in mean values of previously described observer dependent measures per participant

Reliability coefficients for the group

Prestandardization reliability coefficients were <0.80 for three measures. Following standardization 12 reliability coefficients exceeded 0.80



trained doctors continuing professional development. 
Given the reasons set out above, the total number of
articles evaluated was 17 (which involved 18 distinct
studies).

Table 3 contains the descriptive results for the reviewed
articles and Table 4 contains the overall judgement on each
of the articles. The analysis of the validity of the outcome
measures can be found in Additional file 1: Appendix.

Discussion 

This review examined the evidence on how to measure 
outcomes when Patient Video Cases (PVCs) are used in 
healthcare settings. The evidence base was small and extremely 
heterogeneous, and was insufficient to 
specify the best outcomes to use. The heterogeneity in 
the articles was created by the diversity of involved 
health care professionals, varying educational purposes, 
different types of intervention, a wide range of outcome 
methodologies, different internal and construct validities 
and a variety of results. Each of these is examined in 
turn. 

Type of healthcare professional 

The preponderance of projects in undergraduate educa- 
tion is likely related to the large number of medical edu- 
cation academics at these institutions, the access to a 
'captive group' of subjects and the greater ease of asses- 
sing undergraduate outcomes. Further investigation into 
the use of PVCs at postgraduate level and in other 
healthcare professionals is clearly warranted. For all 
health care professionals it is also reasonable to attribute 
the lack of studies to the difficulties in designing [36] 
and funding studies evaluating PVCs. 

Educational purposes and types of intervention 

Given the small number of studies, it is difficult to identify 
clear trends in educational purpose or type of intervention. 
Learner satisfaction and knowledge gain are the 
easiest of the Kirkpatrick training outcomes to measure 
as they do not require external observation or intervention. 
However, these domains are the lowest in the hierarchy 
of evidence needed to confirm that a training 
process has been truly effective [37]. No study looked at 
organisational change, which is in keeping with previous 
literature. A review aiming to identify methods used to 
measure change in the clinical practices of health profes- 
sionals found only 17.6% looked at changes at an organ- 
isation level [38]. Also in this review only one study 
attempted to look at more than one level of training out- 
come. A systematic review of evaluation in formal con- 
tinuing medical education [39] noted 28% of studies 
reviewed looked at two levels and only 6% looked at 
three. 

Methods for determining and assessing outcome 
measures 

Reflecting the wide range of different types of studies 
performed, the validity of the outcome measures used 
was variable. This represents the difficulties of examin- 
ing interventions related to education and training. In 
clinical practice a gold standard approach to assessing 
the effectiveness of medication is the randomised controlled 
trial, with the primary outcome measure being an objective 
endpoint such as a defined reduction or gain in a 
physiological parameter. In training interventions, a single 
endpoint used as an outcome requires considerable interpretation 
and is open to criticism. For example, learner 
satisfaction does not necessarily equate to knowledge 
change, neither does it have a direct correlation with 
change in practice. The absence of a gold standard 
measure to assess training interventions may have led 
researchers to be opportunistic in their use of outcome 
measures. In this review seven studies gave no justifica- 
tion for the outcome measure used [13,15,25,26,29,30]. 
In addition comments by the authors themselves on lim- 
itations to the outcome measures were absent in five of 
the studies [13,26,30,31]. 

Only one study looked at more than one discrete do- 
main in the Kirkpatrick training evaluation framework 
[29]. In this work both learner knowledge and learner 
satisfaction were assessed by different measures (a video 
test, a written test and a course evaluation). Three other 
studies [14,25,31] had more than one outcome measure, 
although these were all subtle variations on a theme 
Table 4 Review of methodological quality of studies using outcome measures to assess the impact of PVCs 

Paper | 1. How is the choice of outcome measure justified? | 2. Will this choice determine the results the study aims to investigate? | 3. To what extent are the writers aware of the disadvantages as well as the advantages of the outcome measures chosen? | 4. How do they overcome the disadvantages? | Internal validity? | Construct validity? 

Using interactive video 
to add physical 
assessment data to 
computer based 
patient simulations [22] 



The development of 
shared cognition in 
paediatric residents 
analysing a patient 
video versus a paper 
patient case [23] 



Enhancing diagnostic 
accuracy among non- 
experts through use of 
video cases [24] 



Introduction of patient 
video clips into 
computer-based 
testing: Effects on item 
statistics and reliability 
estimates [25] 

Video-based test 
questions: A novel 
means of evaluation 
[26] 



Used to justify criterion 
validity of the 
intervention used. 



Process of capturing 
concept link formation 
described with the 
reason for using verbal 
protocol analysis 
supported by 
published evidence. 

Methodology of using 
simulated recall in 
individuals following 
the group discussions 
not well supported. 



Previous work by the 
author has shown 
improved cognitive 
processes when PVCs 
utilised. Improved 
diagnostic accuracy is 
the natural conclusion 
of relevant or 
improved cognitive 
processing. 



No justification for 
answer analysis is 
given although 
standard methodology 
applied. 

No justification for 
questionnaire 
methodology given 
although this format is 
an accepted primary 
approach to gathering 
information on 
satisfaction with a 



Study aims not clear. If 
presumed to be to 
elucidate whether the 
simulations are an 
effective learning 
experience the 
outcome used partially 
confirms the simulation 
represents normal 
practice not that the 
simulation improved 
performance or was a 
useful education tool 

Yes 



Yes 



Yes 



Yes although reliability 
of results must be 
treated with caution. 



Author notes that in 
actual practice chart 
noting is done under 
time pressure whereas 
with this simulation 
there was more time 
available to make 
case-notes more 
complete. 



No comment made on 
this. The absence of 
details on whether a 
specific proforma for 
extracting information 
from the case-notes 
was used makes it 
difficult to assess how 
comparisons were 
made. 



No 



No 



Authors note a small 
number of participants 
and only one case so 
reliability may be 
questioned. 

They also note the 
ability to accurately 
recall and record all 
concept links is not 
established. 

They were aware the 
interviewers were not 
blinded to the 
intervention group of 
the participants. 

Study acknowledges 
the diagnostic 
accuracy as an 
outcome is only a 
short term variable of 
learning. 

Authors touch on, but 
don't specifically note, 
the outcome measure 
is not directly related 
to the intervention 
rather the group 
discussion following 
the intervention. 

Problems with 
questions with low 
discrimination values 
identified and subject 
to supplemental 
analysis. 

No comments made No comments made 



The positioning of the 
simulated recall 
exercise straight after 
the group work limits 
loss of content due to 
degradation of 
memories. 

Authors comment 
although not blinded 
the interviewers were 
not recording 
cognitive processes 
just the thoughts that 
lead to them. 



Authors argue increase 
in diagnostic reasoning 
in non-experts may 
promote further 
literature study and 
learning. 



Items with RPB values 
of zero or less than 0.2 
removed although no 
explanation of why 
these values were 
chosen. 



Yes 



Yes 



Yes 



Yes 



Not 

Applicable 



Not 

Applicable 



Not Not 
Applicable Applicable 



process. 
Table 4 Review of methodological quality of studies using outcome measures to assess the impact of PVCs (Continued) 



Video-based test 
questions: A novel 
means of evaluation 
[26] 

A comparison of 
critical thinking in 
groups of third-year 
medical students in 
text, video, and virtual 
PBL case modalities 
[27] 



Comparison of text 
and video cases in a 
postgraduate problem- 
based learning format 
[28] 



Use of animation- 
enhanced video clips 
for teaching abnormal 
breathing patterns [13] 



Using web-based 
video to enhance 
physical examination 
skills in medical 
students [14] 



Teaching the plantar 
reflex [15] 



A videotape-based 
training method for 
improving the 
detection of 
depression in residents 



No justification given 



There is a theoretical 
association between 
Problem Based 
Learning and critical 
thinking. A sound 
research framework 
exists to analyse 
discourse and code for 
content. Therefore a 
process, discourse 
analysis, exists to 
examine the outcomes 
of PBL in respect to 
critical thinking. 



A coding system for 
cognitive and 
metacognitive thinking 
has been established. 
It is theoretically 
plausible a video case 
would improve 
thinking processes. 



No justification given 
for questionnaire 
methodology although 
this format is an 
accepted primary 
approach to gathering 
information on 
satisfaction with a 
process. 



Summative clinical 
skills assessment has 
been utilised (and 
presumably validated 
although this is not 
stated) in the learning 
institution the study is 
taking place in. 

No justification given. 
Process for assessing 
performance described 
although the 
standardised rating 
scale was not 
demonstrated. 

Outcome measures 
well described but no 
comment on the 
reason for using them. 



Aims of the study not 
clear 



Yes 



No comments made No comments made 



Not 

Applicable 



Yes. Although the 
actual reason for 
improved educational 
outcome in PBL has 
yet to be defined and 
the coding schema 
chosen is only one 
way of evaluating 
cognitive and 
metacognitive 
processes. 

Yes although the 
reliability of results 
must be treated with 
caution. 



The outcome measure Only one author did all Yes 



Yes but only if the 
assessment system is 
valid. 



As patient outcome 
not measured 
methodology can only 
assess how the 
training programme 
improves performance 



Not 

Applicable 



Yes 



Yes, although 
dependent on the 
reliability of the Clinical 
Skills Assessment. 



depends on the 
validity and reliability 
of the coder. 

Although the coder is 
blind to group type it 
is possible for this to 
be suggested by the 
dialogue. 

Time pressures on 
face-to-face groups 
may limit 

opportunities to refine 
critical thinking 
compared to the 
virtual groups. 

The outcome measure 
itself is not examined 
although the article 
notes the use of one 
author for both groups 
and the low numbers 
of residents in both 
groups mean caution 
is required in 
interpretation. 

No comment made. 
The authors note that 
some respondents 
gave inconsistent 
written comments 
when compared with 
their agreement with 
statements and 
comment it was likely 
they had mis- 
understood the 
question. 

The authors comment 
they did not track the 
utilisation of the video 
clip website by 
students and note a 
prospective, 
randomised controlled 
study would have 
been more accurate. 

The authors 
acknowledge the 
effect of the entrance 
test in providing 
education in itself 



Authors note a large 
sample size is needed 
to counter effects of 
attrition. 



the coding but coding 
agreement was 
checked using a 
sample of transcripts 
with two others (one 
not involved in the 
study). 



The use of clause 
frequency enables 
variability in group 
sizes to be addressed. 



Yes 



Yes 



No comments made Yes 



No 



No comments made 



Potentially 
Not 



Yes 



No comment made 



Potentially Yes 
Not 



Authors tested after a 
control period and 
withheld feedback to 
participants about their 
test performances. 



Potentially 
Not 



Yes 
Table 4 Review of methodological quality of studies using outcome measures to assess the impact of PVCs (Continued) 



of long-term care 
facilities [29] 



Advantages of video 
trigger in problem- 
based learning [30] 



A triangulated 
approach to the 
assessment of teaching 
in childhood epilepsy 
[31] 



How video cases 
should be used as 
authentic stimuli in 
problem-based 
medical education [32] 



Visual expertise in 
paediatric neurology 
[33] 



An evaluation of the 
effectiveness of a 
videotape programme 
on inter-observer 
reliability in outcome 
assessment for 
osteoarthritis [34] 

Osteoarthritis 
antirheumatic drug 
trials: Effects of a 
standardized 
instructional videotape 
on the reliability of 
observer-dependent 
dependent outcome 
measures [35] 



in the outcomes 
tested. 



No justification for 
questionnaire 
methodology given 
although this format is 
an accepted primary 
approach to gathering 
information on 
satisfaction with a 
process. 

Triangulation used as a 
more complete and 
robust measure to 
validate findings. 



Yes as long as sample 
of participants valid. 



Focus groups a well 
refined qualitative tool 
which allows deep analysis 
of concepts presented. 



Important differences 
in perception between 
experts and novices 
studying dynamic 
stimuli have been 
documented. Authors 
note this field is 
underexplored in the 
medical domain but 
use a high quality eye 
tracking machine and 
linked to spoken 
cognitive processes 

Outcome measure 
used in previous 
studies to assess 
performance in 
musculo-skeletal 
examination. 

Outcome measure 
used in previous 
studies to assess 
performance in 
musculo-skeletal 
examination. 



Yes although must be 
employed in a 
methodological 
fashion. However the 
use of video clips was 
not the sole purpose 
of the study and 
questions not posed 
to determine this. 

Yes 



Yes 



Yes 



Yes 



They also note the 
vignette video test 
may improve practice 
in its own right. 

Staff were allowed to 
choose timing and 
type of session 
according to their 
needs with no control 
for group or individual 
sessions. 

No comments made No comments made 



Yes 



Yes 



Very little attention 
paid to confounding 
influences and the fact 
that the cross - 
validation was not 
particularly well 
demonstrated. 



Clear 

acknowledgement of 
the problems with 
individuals dominating 
or evading group 
discussion. 

Note that outcome 
method was novel 
and made efforts to 
triangulate findings to 
gold standard 
outcomes (such as 
correct diagnosis) 



A larger matrix and 
more observers and 
patients may have 
been used to improve 
reliability. 



A larger matrix and 
more observers and 
patients may have 
been used to improve 
reliability. 



No comments made 



No 



Not 

applicable 



Very experienced 
facilitator used 



Used variety of 
experience in subject 
population 



Yes 



Yes 



Yes 



Yes 



The authors' own 
previous work has 
indicated the 6x6 is 
pragmatic and 
representative. 



The authors' own 
previous work has 
indicated the 6x6 is 
pragmatic and 
representative. 



No 



Yes 



No 



Yes 



Roland et al. BMC Medical Education 2012, 12:125 
http://www.biomedcentral.com/1472-6920/12/125 






such as scores in different types of clinical examination 
in the same test. 

Only two of the studies [27,33] satisfied all domains 
when deciding on whether internal and construct validity 
had been achieved. Three other papers [15,23,29] had 
minor concerns, generally relating to the extent to which the 
outcome measure itself affected the results. Questionnaire 
studies reflecting learner satisfaction tended not to perform 
well as control between groups was not possible and 
confounding factors were very difficult to assess. 

Results of the interventions 

Nearly all papers were positive regarding the use of 
PVCs (regardless of whether the analysis above had 
revealed concerns over the validity of the outcome 
measure). The medical student studies regarding critical 
analysis and thinking showed strong results in favour of 
the use of PVCs. The underlying hypotheses of these 
studies [23,24,27,28,32] were plausible and the method- 
ologies used rigorous. A researcher independent of these 
groups has also recently shown students prefer this use 
of PVCs to current problem based learning techniques 
[30] so triangulation has in some respects been achieved 
in this field. A recent paper demonstrating experts are 
more focused on the relevant clinical features within 
patient video clips has been further supported by as yet 
unpublished evidence that eye movement modelling 
may improve diagnostic reasoning. This methodology, 
where the minute movements of the eye are tracked 
while observing dynamic images, has strong construct 
validity. It is felt the cognitive 'load' of dynamic video 
clips may encourage cognitive processing [40] and therefore 
methodologies to explore the extent of this load created by 
PVCs are welcome. Future research must be cognisant of 
the fact that under- or over-load may occur depending on 
the capacity of the individual engaging in the activity. 
Extraneous cognitive load [41] may be controllable to 
some extent by investigators, and this will aid determination 
of its impact on the outcome of the intervention. 

Studies concerning testing methods and clinical exam- 
ination showed no obvious differences between PVCs 
and current assessment methods. The potential difficulty 
and cost of placing video clips into examinations 
(whether formative or summative) may have limited the 
number of validation studies in this area. In studies of 
clinical examination technique which aimed to show 
improvement following a PVC intervention, there was 
supportive evidence although initial skill sets tended to 
be relatively high. The importance of controlling for this 
was demonstrated by the use of the Solomon Four 
Group design on a video intervention to improve exam- 
ination of the plantar reflex [15]. In this study an effect 
was only seen when pre-intervention performance 
was assessed. 



The video-based training method for improving the 
detection of depression in residents of long term care 
facilities demonstrated an increase in performance of the 
intervention group in both knowledge assessments [29]. 
Direct patient benefit was not assessed so an improvement 
in clinical care as a result of PVC use cannot be claimed. However, 
given the good levels of satisfaction on questionnaire 
testing it is likely that participants would not have been 
averse to incorporating newly acquired learning into their 
day-to-day practice. 

Limitations 

The heterogeneity of the current published evidence made 
a robust narrative review extremely difficult. Apart from 
the work on how PVCs encourage discourse and critical 
thinking, there were no common themes from which to 
extract information and analyse composite outcomes. 
This may represent difficulty in undertaking research 
in the field (the cost of production of video clips), 
the difficulty in defining valid outcome measures or publi- 
cation bias due to a paucity of positive outcomes. This 
exemplifies the challenge that much medical education 
research is Action Research, research based on the instruc- 
tors' own practice. 

Publication bias is unlikely to be significant as there 
is literature in which research is positive [42] 
regarding the use of video and online technologies but 
there are also negative [43] publications in existence. It 
would seem unlikely a particular modality of online or 
audiovisual learning would be subject to a different 
research agenda. 

The main limitation of this study is the low number of 
articles that were found. The search strategy used was 
expansive although "Patient Video Clip" or similar terms 
are not used by all researchers in the field. It is possible 
terms other than those searched have been used 
although the number of papers missed is likely to be very 
small. Extraction of data was performed by a sole reviewer 
so it is possible that errors of typology were made although 
the small number of final articles has allowed extensive 
examination of the papers by all the authors. 

Conclusion 

This review process has demonstrated the diverse nature of 
research in determining the effectiveness of PVCs in educa- 
tion. Medical education occurs in a variety of environments 
and the complicated interplay of confounding variables 
makes interpretation of outcomes difficult. The following 
recommendations would enable the production of a stand- 
ard conceptual framework to guide future research in 
the area. 

• Studies should classify which facet of training or 
educational outcome the study is aiming to explore. 
• Studies should aim to validate a particular outcome 
measure, preferably by reproducing previous work 
rather than adopting new methods. 

• A description of the validity of the chosen outcome 
measure should be included in the study protocol. 

• Although control groups are useful for 
demonstrating the benefit of a PVC intervention, 
more evidence is needed on whether the outcome 
measure demonstrates construct validity. 

• Studies on PVCs should take account of cognitive 
theory with the cognitive processing enhancement, 
demonstrated in a number of the medical student 
papers, tested at a postgraduate level. Although 
pragmatic outcome measures are easier to achieve, 
explanatory trials are needed. 

• Prior-knowledge and behaviour testing is vital to 
demonstrate improvement. 

Additional file 



Additional file 1: Appendix. Review of studies to assist in evaluation of 
internal and construct validity based on framework via Farringdon. 